A GRADIENT INEQUALITY FOR NON-DIFFERENTIABLE FUNCTIONS


I. Introduction

In mathematical programming the following result is very useful:

THEOREM 1

Assumption: $K$ is a nonempty open subset of $R^n$,* $f : K \to R$, $f$ is differentiable on $K$,** $X$ is a convex subset of $K$.

Conclusion: $f$ is convex on $X$ if, and only if,

$$f(x) - f(y) \ge (x - y)^T f'(y) \quad \text{for all } x, y \in X,$$

where $f'(y)$ denotes the gradient of $f$ at $y$.

The above theorem is stated, in a slightly weaker form, in [8, Vol. 1, p. 405]; for the sake of completeness we present a proof of Theorem 1 in Appendix A. A related result, extremely useful in programming theory, and discussed in [3], is:

THEOREM 2

Assumption: $K$ is an open subset of $R^n$, $f : K \to R$, $f$ is differentiable on $K$, $X$ is a convex subset of $K$, $f$ is convex on $X$, $x_0 \in X$.

Conclusion: $f(x_0) \le f(x)$ for all $x \in X$ if, and only if, $(x - x_0)^T f'(x_0) \ge 0$ for all $x \in X$.

*$R^n$ denotes the set of all column $n$-tuples with real number components; we write $R$ in place of $R^1$.

**At each point of $K$, all first partial derivatives exist.


The proof of Theorem 2 is straightforward. If we know that $(x - x_0)^T f'(x_0) \ge 0$ whenever $x \in X$, then a direct application of Theorem 1, with $y = x_0$, yields $f(x_0) \le f(x)$ for all $x \in X$. Conversely, if $x_0$ minimizes $f$ on $X$ then for each $x \in X$ and $\lambda \in (0, 1)$, since $\lambda x + (1 - \lambda)x_0 \in X$, we must have $f(x_0) \le f(\lambda x + (1 - \lambda)x_0) = f(x_0 + \lambda(x - x_0))$. As a consequence we have:

$$\frac{f(x_0 + \lambda(x - x_0)) - f(x_0)}{\lambda} \ge 0 \quad \text{whenever } \lambda \in (0, 1);$$

letting $\lambda$ approach zero we obtain $(x - x_0)^T f'(x_0) \ge 0$. Q.E.D.
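Theorems 1 and 2 can be checked numerically on a small instance. The sketch below is only an illustration: the function $f(x) = x_1^2 + 2x_2^2$, the box $X = [1, 2] \times [-1, 1]$, the sample grid and all Python names are assumptions introduced here, not part of the original text.

```python
import itertools


def f(x):
    # Hypothetical convex, differentiable test function: f(x) = x1^2 + 2*x2^2.
    return x[0] ** 2 + 2.0 * x[1] ** 2


def grad_f(x):
    # Gradient f'(x) of the test function above.
    return (2.0 * x[0], 4.0 * x[1])


def gradient_gap(x, y):
    # f(x) - f(y) - (x - y)^T f'(y); Theorem 1 says this is >= 0 for convex f.
    g = grad_f(y)
    return f(x) - f(y) - sum((xi - yi) * gi for xi, yi, gi in zip(x, y, g))


# Sample points of the convex set X = [1, 2] x [-1, 1].
X_samples = [(1.0 + 0.25 * i, -1.0 + 0.5 * j) for i in range(5) for j in range(5)]

# Theorem 1: the gradient inequality on every sampled pair (x, y).
min_gap = min(gradient_gap(x, y) for x, y in itertools.product(X_samples, X_samples))

# Theorem 2: x0 = (1, 0) minimizes f on X, so (x - x0)^T f'(x0) >= 0 on X.
x0 = (1.0, 0.0)
g0 = grad_f(x0)
min_directional = min(
    sum((xi - x0i) * gi for xi, x0i, gi in zip(x, x0, g0)) for x in X_samples
)
print(min_gap, min_directional)
```

Both printed quantities should be nonnegative. A finite sample of course proves nothing; it only makes the two inequalities concrete.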

We shall be concerned here with generalizing Theorem 2 by relaxing the hypothesis that $f$ is differentiable everywhere on $K$ (and thus we shall actually be able to omit mentioning $K$), though keeping the requirement that $f$ be convex on $X$. In case we know only that $f$ is convex, we still wish to obtain nontrivial characterizations of the fact that $f$ has a minimum on $X$ and, if possible, obtain some information about the minimizing point (i.e., $x_0$). We will be able to obtain such characterizations when $X$ and $f$ are of a special form, though $f$ may fail to be differentiable.

In what follows we shall frequently apply the so-called Minkowski-Farkas lemma and we list it here for reference:

Lemma 1: Let $A$ be any real $m \times n$ matrix and let $b \in R^m$. The following statements are equivalent:

(i) $b^T \pi \le 0$ whenever $A^T \pi \le 0$, $\pi \in R^m$,*

(ii) there exists an $x \in R^n$ such that $x \ge 0$, $Ax = b$.

*A vector inequality means that the inequality indicated holds for each component.
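A small numerical illustration of Lemma 1 may help fix the roles of $A$, $b$, $x$ and $\pi$. The matrices, the sample grid and the helper names below are hypothetical choices made for this sketch; checking (i) on a finite grid is only a spot check, not a proof.

```python
def mat_vec(M, v):
    # Matrix times column vector.
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def statement_i_holds_on_samples(A, b, samples):
    # Spot check of (i): b^T pi <= 0 for every sampled pi with A^T pi <= 0.
    At = transpose(A)
    for pi in samples:
        if all(c <= 0.0 for c in mat_vec(At, pi)):
            if sum(bi * pii for bi, pii in zip(b, pi)) > 0.0:
                return False
    return True


A = [[1.0, 2.0], [0.0, 1.0]]          # hypothetical 2 x 2 matrix
x = [1.0, 1.0]                        # x >= 0, so (ii) holds for b = A x
b_feasible = mat_vec(A, x)            # equals [3.0, 1.0]
b_infeasible = [-1.0, 0.0]            # A x = b forces x = (-1, 0), violating x >= 0

samples = [(-2.0 + 0.5 * i, -2.0 + 0.5 * j) for i in range(9) for j in range(9)]

print(statement_i_holds_on_samples(A, b_feasible, samples))
print(statement_i_holds_on_samples(A, b_infeasible, samples))
```

For `b_infeasible` the sample $\pi = (-1, 0)$ satisfies $A^T \pi \le 0$ but $b^T \pi = 1 > 0$, so (i) fails exactly where (ii) fails.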
II. Discussion of Special Problem

Suppose we are given an $n$-tuple $a = (a_1, \ldots, a_n)$, an $n \times n$ real symmetric positive semi-definite matrix $C$ and a real $m \times n$ matrix $A$. We define:

$$X = R^n \cap \{x \mid Ax \le 0\}, \qquad f(x) = ax + (x^T C x)^{1/2} \ \text{ for } x \in R^n. \tag{1}$$

It is immediate that $X$ is a convex cone (it is in fact a "finite cone"); also $X$ is obviously nonempty because $0 \in X$. In addition, it follows readily from a well known inequality [4, Theorem 1-(ii)] that $f$ is a convex function on $R^n$.

With $f$ and $X$ as defined in (1) and $x_0 \in X$ we should like to obtain a statement similar to the conclusion in Theorem 2. Some direct considerations lead us to believe that the situation is rather complicated. We note that $0 \in X$ and $f(0) = 0$; furthermore, since $X$ is a cone, we observe that if for any $x \in X$ we had $f(x) < 0$ then indeed $f$ is unbounded below on $X$.

Thus the only situation in which a minimum exists is when a minimizing $x$ is $x_0 = 0$, and of course $f$ does not have a well defined gradient at the origin except in case $C = 0$.
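The unboundedness claim rests on the positive homogeneity of $f$, which the text uses without writing out. A short worked version, using only the definitions in (1) (the symbol $\bar{x}$ is introduced here purely for illustration):

```latex
% Positive homogeneity of f and of the constraint, for t >= 0:
f(tx) = a(tx) + \bigl((tx)^T C\,(tx)\bigr)^{1/2}
      = t\,ax + t\,(x^T C x)^{1/2}
      = t\,f(x),
\qquad
A(tx) = t\,Ax \le 0 \ \text{ whenever } Ax \le 0 .

% Hence, if f(\bar{x}) < 0 for some \bar{x} \in X, then t\bar{x} \in X and
f(t\bar{x}) = t\,f(\bar{x}) \longrightarrow -\infty \quad (t \to \infty).
```

So either $f \ge 0$ on $X$, in which case $x_0 = 0$ is a minimizer with value zero, or $f$ has no minimum on $X$ at all.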

Pursuing the above line of reasoning let us assume that $f(x) = ax + (x^T C x)^{1/2} \ge 0$ whenever $x \in X$ and that, in addition, the minimum of $f$ (i.e., zero) is taken on at some point $x_0$ such that $x_0^T C x_0 > 0$, and thus the gradient of $f$ is defined at $x_0$. Summarizing the preceding, $x_0$ satisfies:

$$x_0 \in R^n, \quad Ax_0 \le 0, \quad ax_0 + (x_0^T C x_0)^{1/2} = 0, \quad x_0^T C x_0 > 0. \tag{2}$$


Using (2) the condition that $(x - x_0)^T f'(x_0) \ge 0$ whenever $Ax \le 0$ may be written as:

$$\left(a + \frac{x_0^T C}{(x_0^T C x_0)^{1/2}}\right) x \ge 0 \quad \text{whenever } Ax \le 0. \tag{3}$$
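For the reader who wants the intermediate step, (3) can be verified as follows. The computation assumes $x_0^T C x_0 > 0$, as in (2), so that the square root is differentiable at $x_0$; the gradient is written as a row vector.

```latex
% Gradient of f(x) = ax + (x^T C x)^{1/2} at x_0 (C symmetric):
f'(x_0)^T = a + \frac{x_0^T C}{(x_0^T C x_0)^{1/2}},
\qquad
f'(x_0)^T x_0 = a x_0 + (x_0^T C x_0)^{1/2} = f(x_0) = 0 .

% Therefore, for every x:
(x - x_0)^T f'(x_0) = f'(x_0)^T x - f'(x_0)^T x_0
                    = \left(a + \frac{x_0^T C}{(x_0^T C x_0)^{1/2}}\right) x .
```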


Up to this point, we have assumed that $x_0$ satisfies (2) and that $0 = f(x_0) \le f(x)$ whenever $x \in X$; we wrote down (3) as a statement "to be contemplated" and have not said anything about the truth or falsity of (3). However, it is quite trivial to show that when $x_0$ satisfies (2) and (3) is true then, necessarily, $f(x_0) \le f(x)$ for all $x \in X$. That is, from the fact that

$$x_0^T C x \le (x_0^T C x_0)^{1/2} (x^T C x)^{1/2}, \tag{4}$$

which holds for any $x_0, x \in R^n$ [5, Lemma 1], and (3) we conclude that $x \in R^n$ and $Ax \le 0$ implies $f(x) = ax + (x^T C x)^{1/2} \ge 0$. The converse statement, that one can conclude (3) from (2) and the fact that $f(x) \ge 0$ whenever $x \in X$, follows from Theorem 2 and the fact that $f$ is then differentiable and convex in some small neighborhood of $x_0$. We have thus obtained a characterization of a minimizing $x_0$ which satisfies $x_0^T C x_0 > 0$. Before attempting to dispose of minimizing $x_0$'s satisfying $x_0^T C x_0 = 0$ (and thus, by (4), $C x_0 = 0$) we observe that (3) is of the form: $Ax \le 0$ implies $dx \le 0$; thus by Lemma 1 we know that (3) is equivalent to the statement:

$$\text{There exists } \pi \in R^m \text{ such that } \pi \ge 0 \ \text{ and } \ \pi^T A + a + \frac{x_0^T C}{(x_0^T C x_0)^{1/2}} = 0. \tag{5}$$
We observe next that if there exists an $x_0$ satisfying (2) and (5) then by a suitable normalization we should be able to assume $x_0^T C x_0 = 1$; specifically, letting $z = (x_0^T C x_0)^{-1/2} x_0$ we obtain from (2) and (5) the conditions:

$$z \in R^n, \ \pi \in R^m; \qquad Az \le 0, \ \pi \ge 0; \qquad \pi^T A + a + z^T C = 0, \ z^T C z = 1. \tag{6}$$

Again, by using (4), it is trivial to show that if there exist $\pi$ and $z$ satisfying (6) then $f(x) \ge 0$ whenever $x \in X$. In fact, and this is very crucial to our development, we note that the same is true if we relax (6) to read:

$$z \in R^n, \ \pi \in R^m; \qquad Az \le 0, \ \pi \ge 0; \qquad \pi^T A + a + z^T C = 0, \ z^T C z \le 1. \tag{6'}$$

We note that (6') is a more "realistic" statement in view of the fact that the $z$ needed to satisfy (6') may yield $Cz = 0$. That is, we should really like to conclude (6') from the fact that $ax + (x^T C x)^{1/2} \ge 0$ whenever $x \in R^n$ and $Ax \le 0$; this is indeed the case and follows directly from:

THEOREM 3

Assumption: (i) $C$* is a real symmetric $n \times n$ matrix, $A$ is an $m \times n$ real matrix, $u^T \in R^n$.

(ii) $(ux)^2 \le x^T C x$ whenever $x \in R^n$ and $ux \ge 0$, $Ax \le 0$.

Conclusion: There exist $z \in R^n$, $\pi \in R^m$ such that $\pi \ge 0$, $Az \le 0$, $u = \pi^T A + z^T C$, $z^T C z \le 1$.

The proof of Theorem 3 will be found in Appendix B.

*Not necessarily positive semi-definite.
As mentioned above, a direct consequence of Theorem 3 is:

Lemma 2: Let $f$ and $X$ be as in (1); then $f(x) \ge 0$ for all $x \in X$ if, and only if, there exist $\pi$ and $z$ satisfying (6').

Proof: The sufficiency of (6') is, as outlined above, quite clear. Assuming that for each $x \in X$ we have $f(x) \ge 0$ and letting $u = -a$, we see that the assumptions of Theorem 3 hold (if $ux \ge 0$ and $Ax \le 0$ then $0 \le ux = -ax \le (x^T C x)^{1/2}$, hence $(ux)^2 \le x^T C x$), while the conclusion of Theorem 3 is precisely (6').
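The sufficiency half of Lemma 2 can be illustrated on a tiny instance: exhibit $\pi$ and $z$ satisfying (6') and observe that $f \ge 0$ on sampled points of the cone. The data below ($X$ the nonnegative quadrant, $C$ the identity, the particular $a$, $\pi$, $z$), the tolerance `eps` and all function names are assumptions made for this sketch.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def mat_vec(M, v):
    # Matrix times column vector.
    return [dot(row, v) for row in M]


def vec_mat(v, M):
    # Row vector times matrix.
    return [dot(v, col) for col in zip(*M)]


def f(a, C, x):
    # f(x) = a x + (x^T C x)^(1/2) as in (1); max(..., 0) guards against rounding.
    return dot(a, x) + math.sqrt(max(dot(x, mat_vec(C, x)), 0.0))


def satisfies_6_prime(a, C, A, pi, z, eps=1e-9):
    # Checks Az <= 0, pi >= 0, pi^T A + a + z^T C = 0, z^T C z <= 1 (up to eps).
    residual = [p + ai + q for p, ai, q in zip(vec_mat(pi, A), a, vec_mat(z, C))]
    return (
        all(v <= eps for v in mat_vec(A, z))
        and all(p >= -eps for p in pi)
        and all(abs(r) <= eps for r in residual)
        and dot(z, mat_vec(C, z)) <= 1.0 + eps
    )


# Hypothetical data: Ax <= 0 describes the nonnegative quadrant.
a = [-0.6, 0.5]
C = [[1.0, 0.0], [0.0, 1.0]]
A = [[-1.0, 0.0], [0.0, -1.0]]
z = [0.6, 0.0]
pi = [0.0, 0.5]

certificate_ok = satisfies_6_prime(a, C, A, pi, z)
cone_samples = [[0.5 * i, 0.5 * j] for i in range(7) for j in range(7)]
min_f = min(f(a, C, x) for x in cone_samples)
print(certificate_ok, min_f)
```

Replacing `a` by, say, `[-2.0, 0.0]` breaks the certificate, and indeed $f(1, 0) = -1 < 0$ for that choice, in agreement with the lemma.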

III. A Nonhomogeneous Problem

Let us modify (1) so that $X$ is defined by the inequalities $Ax \le b$ rather than by $Ax \le 0$, where $b$ is a fixed vector in $R^m$. First of all, it need no longer be true that $x_0 = 0 \in X$; in fact $X$ may be empty, and to dispose of this difficulty we shall assume the contrary, that there exists an $x \in R^n$ such that $Ax \le b$, or equivalently (according to Lemma 1) that:

$$\text{There is no } \pi \in R^m \text{ such that } \pi \ge 0, \ \pi^T A = 0, \ \pi^T b < 0. \tag{7}$$

Since $X$ need no longer be a cone, the minimum of $f(x) = ax + (x^T C x)^{1/2}$ on $X$ (when it exists) may be any real number; let us assume that $M$ is a lower bound of $f$ on $X$, i.e.,

$$ax + (x^T C x)^{1/2} \ge M \quad \text{whenever } x \in R^n \text{ and } Ax \le b. \tag{8}$$

With $b = 0$ and $M = 0$ we would be in the situation discussed in Section II with Lemma 2 applicable. We shall see now that the more general condition (8) may also be characterized in a fashion analogous to the homogeneous case. Towards this let us consider the relations:
$$z \in R^n, \ \pi \in R^m, \ \lambda \in R; \qquad \pi \ge 0, \ Az \le \lambda b; \qquad \pi^T A + a + z^T C = 0, \ \pi^T b + M \le 0, \ z^T C z \le 1, \tag{9}$$

and:

THEOREM 4

The statement (8) is true if, and only if, there exist $z$, $\pi$ and $\lambda$ satisfying (9).

Proof: If $\pi$, $z$, $\lambda$ satisfy (9) and $x \in R^n$ is such that $Ax \le b$ then:

$$0 = \pi^T A x + ax + z^T C x \le \pi^T b + ax + z^T C x \le -M + ax + (z^T C z)^{1/2} (x^T C x)^{1/2} \le -M + ax + (x^T C x)^{1/2},$$

and thus $M \le ax + (x^T C x)^{1/2}$.

Conversely, let us assume that (8) is true. We assert that the statement

$$\text{If } x \in R^n, \ \eta \in R \text{ are such that } Ax - \eta b \le 0, \ \eta \ge 0, \text{ then } \eta M \le ax + (x^T C x)^{1/2}, \tag{10}$$

is true. Clearly, when $\eta > 0$, (10) holds: simply consider $\eta^{-1} x$ and compare (10) with (8), which is assumed true. In case $\eta = 0$, we should like to conclude $ax + (x^T C x)^{1/2} \ge 0$ from $Ax \le 0$; the last follows from the facts that (i) the statement (7) is true and (ii) $ax + (x^T C x)^{1/2}$ is convex, homogeneous and bounded below on the set $\{x \mid Ax \le b\}$. We next note that (10) is precisely analogous to the homogeneous statement in Lemma 2; applying Lemma 2 we see that from the fact that (10) holds it follows that there must exist $\pi$, $z$ and $\lambda$ satisfying (9). Q.E.D.
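To make Theorem 4 concrete, the sketch below checks a certificate of the form (9) on one hypothetical instance and compares the certified bound $M$ with sampled values of $f$. The instance ($a = 0$, $C$ the identity, $X = \{x \mid x_1 \ge 3,\ x_2 \ge 4\}$, $M = 5$), the tolerance `eps` and the function names are all assumptions of this illustration.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def mat_vec(M, v):
    # Matrix times column vector.
    return [dot(row, v) for row in M]


def vec_mat(v, M):
    # Row vector times matrix.
    return [dot(v, col) for col in zip(*M)]


def f(a, C, x):
    # f(x) = a x + (x^T C x)^(1/2); max(..., 0) guards against rounding.
    return dot(a, x) + math.sqrt(max(dot(x, mat_vec(C, x)), 0.0))


def satisfies_9(a, C, A, b, M, pi, z, lam, eps=1e-9):
    # Checks pi >= 0, Az <= lam*b, pi^T A + a + z^T C = 0,
    # pi^T b + M <= 0 and z^T C z <= 1, each up to the tolerance eps.
    residual = [p + ai + q for p, ai, q in zip(vec_mat(pi, A), a, vec_mat(z, C))]
    return (
        all(p >= -eps for p in pi)
        and all(azi <= lam * bi + eps for azi, bi in zip(mat_vec(A, z), b))
        and all(abs(r) <= eps for r in residual)
        and dot(pi, b) + M <= eps
        and dot(z, mat_vec(C, z)) <= 1.0 + eps
    )


# Hypothetical instance: Ax <= b means x1 >= 3 and x2 >= 4; f is the Euclidean norm.
a = [0.0, 0.0]
C = [[1.0, 0.0], [0.0, 1.0]]
A = [[-1.0, 0.0], [0.0, -1.0]]
b = [-3.0, -4.0]
M = 5.0
pi = [0.6, 0.8]
z = [0.6, 0.8]
lam = 0.2

certificate_ok = satisfies_9(a, C, A, b, M, pi, z, lam)
feasible_samples = [[3.0 + 0.5 * i, 4.0 + 0.5 * j] for i in range(6) for j in range(6)]
min_f = min(f(a, C, x) for x in feasible_samples)
print(certificate_ok, min_f)
```

In this instance the bound is attained at $x = (3, 4)$, and the same $\pi$, $z$, $\lambda$ no longer pass the check if $M$ is raised to 6, since then $\pi^T b + M > 0$.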
IV. Concluding Remarks

The following remarks are in order:

A. Lemma 2 is precisely the principal theorem of [5] where it is demonstrated directly by using a certain "pseudo-norm" on the set of sequences of points in $R^n$ together with Lemma 1. Here Lemma 2 depends ultimately on the theorem of Frank-Wolfe [7] that any quadratic function bounded below on a polygonal convex set achieves its minimum.

B. A significant interpretation of Lemma 2 is to think of it as a direct generalization of the Minkowski-Farkas theorem (our Lemma 1), because the former reduces to the latter when $C = 0$.

C. One can think of Lemma 2 as expressing the fact that a certain convex set is closed. The set $T$ in question consists of all negatives of $a$'s for which there exist $z$ and $\pi$ satisfying (6'); these in turn are the tangent planes of the convex function $g(x) = (x^T C x)^{1/2}$ over the cone $\{x \mid Ax \le 0\}$. In general, for arbitrary convex (and even continuous) $g$ this set of tangents need not be closed; e.g., if $g$ exhibits "asymptotic" behavior, a limit of tangents need not be a tangent.

D. In view of the preceding and also because of the relation of Lemma 2 to Theorem 2, it would be of interest to generalize Lemma 2 to a larger class of pairs $(X, f)$, e.g., keeping $X$ a finite cone and allowing $f$ to take the form $f(x) = ax + \sum_{k=1}^{p} (x^T C_k x)^{1/2}$, where each $C_k$ is positive semi-definite.
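Remark B can be made explicit. The following specialization is a sketch of the reduction, obtained by substituting $C = 0$ into (1) and (6'); the symbol $w$ is introduced here only to avoid a clash with the $\pi$ of (6').

```latex
% With C = 0 we have f(x) = ax, and the conditions on z in (6') become vacuous
% (take z = 0), so Lemma 2 reads:
ax \ge 0 \ \text{ whenever } Ax \le 0
\quad\Longleftrightarrow\quad
\text{there exists } \pi \in R^m, \ \pi \ge 0, \ \pi^T A + a = 0 .

% This is Lemma 1 applied to the matrix A^T and the vector -a^T:
% (i)  (-a^T)^T w = -aw \le 0 \ \text{ whenever } (A^T)^T w = Aw \le 0, \ w \in R^n ;
% (ii) there exists \pi \ge 0 with A^T \pi = -a^T .
```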
APPENDIX A

Proof of Theorem 1

All quantities are as defined in Theorem 1. We first demonstrate the simpler half of the proof, that if

$$f(x) - f(y) \ge (x - y)^T f'(y) \quad \text{whenever } x, y \in X, \tag{11}$$

then $f$ is convex. Suppose $u$, $v$ are elements of $X$ and $\lambda \in [0, 1]$; let $x = u$ and $y = \lambda u + (1 - \lambda)v$, then from (11) we get:

$$f(\lambda u + (1 - \lambda)v) \le f(u) - (1 - \lambda)(u - v)^T f'(y). \tag{12}$$

Similarly, letting $x = v$ and $y = \lambda u + (1 - \lambda)v$ we obtain from (11):

$$f(\lambda u + (1 - \lambda)v) \le f(v) + \lambda(u - v)^T f'(y). \tag{13}$$

Multiplying (12) by $\lambda$, and (13) by $(1 - \lambda)$, then adding, we obtain the required inequality.
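Written out, the combination of (12) and (13) is the following one-line computation, with $y = \lambda u + (1 - \lambda)v$ as above:

```latex
% lambda * (12) + (1 - lambda) * (13):
f(y) = \lambda f(y) + (1 - \lambda) f(y)
  \le \lambda f(u) + (1 - \lambda) f(v)
      + \bigl[-\lambda(1 - \lambda) + (1 - \lambda)\lambda\bigr]\,(u - v)^T f'(y)
  = \lambda f(u) + (1 - \lambda) f(v) .
```

The gradient terms cancel exactly, which is why no property of $f'$ beyond (11) is used in this half of the proof.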

We show next that if $f$ is convex on $X$ then (11) must hold. First, we show the above true for $n = 1$, then reduce the general case to the one-dimensional one. Let us assume that $K$ is an open subset of $R$ and that $f$ is differentiable on $K$; then the convexity of $f$ on a convex subset $X$ of $K$ is simply the statement that:

$$f(\beta) \le \left(\frac{\gamma - \beta}{\gamma - \alpha}\right) f(\alpha) + \left(\frac{\beta - \alpha}{\gamma - \alpha}\right) f(\gamma) \quad \text{whenever } \alpha < \beta < \gamma \text{ and } \alpha, \gamma \in X. \tag{14}$$

It is readily seen that (14) is equivalent to:

$$\frac{f(\beta) - f(\alpha)}{\beta - \alpha} \le \frac{f(\gamma) - f(\beta)}{\gamma - \beta} \quad \text{whenever } \alpha < \beta < \gamma \text{ and } \alpha, \gamma \in X. \tag{15}$$

Letting first $\beta$ approach $\gamma$ and then, independently, letting $\beta$ approach $\alpha$ we obtain from (15):

$$f'(\alpha) \le \frac{f(\gamma) - f(\alpha)}{\gamma - \alpha} \le f'(\gamma) \quad \text{if } \alpha < \gamma \text{ and } \alpha, \gamma \in X, \tag{16}$$

which is in fact equivalent to (15). Lastly, we note that (16) is equivalent to (11). We just demonstrated that if $n = 1$ and $f$ satisfies the assumptions of Theorem 1 then (11) must hold whenever $f$ is convex.

Finally, suppose $n$ is any positive integer and $f$ is convex on $X$ and satisfies the assumptions of Theorem 1. Since $K$ is open, for fixed $x, y \in X \subset K$, there must exist an open interval $I_{xy}$ containing $[0, 1]$ and such that $\lambda x + (1 - \lambda)y$ is in $K$ whenever $\lambda$ is in $I_{xy}$. Defining the function $h_{xy}$ on $I_{xy}$ by

$$h_{xy}(\lambda) = f(\lambda x + (1 - \lambda)y), \quad \text{for } \lambda \in I_{xy},$$

one checks immediately that $h_{xy}$ must be convex on $[0, 1]$. Consequently, since Theorem 1 holds for $n = 1$, we know that:

$$h_{xy}(1) - h_{xy}(0) \ge (1 - 0)\, h'_{xy}(0), \tag{17}$$

which is precisely the statement (11).

Q.E.D.
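The identification of (17) with (11) uses the chain rule. A brief worked version, assuming $f$ is differentiable at $y$ in the sense that the chain rule applies along the segment from $y$ to $x$:

```latex
% Endpoints and derivative of h_{xy}(\lambda) = f(y + \lambda(x - y)):
h_{xy}(1) = f(x), \qquad h_{xy}(0) = f(y), \qquad
h'_{xy}(0) = \frac{d}{d\lambda}\, f\bigl(y + \lambda(x - y)\bigr)\Big|_{\lambda = 0}
           = (x - y)^T f'(y) .

% Substituting into (17):
f(x) - f(y) \ge (x - y)^T f'(y) .
```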
It should be noted that in the first half of the above proof no use was made of the properties of the gradient of $f$; thus, in fact, the following theorem is true.

THEOREM 5

Let $f$, $K$ and $X$ satisfy the assumption of Theorem 1; then $f$ is convex on $X$ if, and only if, there exists a function $G : K \to R^n$ such that

$$f(x) - f(y) \ge (x - y)^T G(y) \quad \text{for all } x, y \in X.$$

Similarly, one notes that the following is true:

THEOREM 6

Let $K$ be a nonempty open convex subset of $R^n$, $f : K \to R$; then $f$ is convex and differentiable on $K$ if, and only if, there exists a continuous function $G : K \to R^n$ such that

$$f(x) - f(y) \ge (x - y)^T G(y) \quad \text{for all } x, y \in K.$$
APPENDIX B

Proof of Theorem 3

Consider the linear inequalities:

$$z \in R^n, \ \pi \in R^m; \qquad \pi \ge 0, \ Az \le 0, \ u^T = A^T \pi + Cz, \tag{18}$$

and let $P$ consist of all ordered pairs $(\pi, z)$ satisfying (18). We show first that $P$ is nonempty. Since (18) represents a system of linear inequalities, the fact that $P$ is empty is equivalent, by Lemma 1, to the fact that there exist $x$ and $y$ satisfying:

$$x \in R^n, \ y \in R^m; \qquad y \ge 0, \ Ax \le 0, \ Cx = A^T y, \ ux > 0. \tag{19}$$

Thus, if $P$ were empty, we would get from (19):

$$x^T C x = y^T A x \le 0,$$

consequently $x^T C x \le 0$, which together with $ux > 0$ and $Ax \le 0$ contradicts the assumption of Theorem 3. Thus $P$ is nonempty.

Next, if for some $(\pi, z) \in P$ we had $z^T C z \le 1$ then we would have the desired conclusion of Theorem 3; assume that $z^T C z > 1$ whenever $(\pi, z) \in P$. Applying the result in [7] that a quadratic function bounded below on a polyhedral convex set attains its minimum, we know that there exist $(\pi_0, z_0) \in P$ such that $1 < z_0^T C z_0 \le z^T C z$ for each $(\pi, z) \in P$. Furthermore, it is clear that

$$(z_0^T C) z_0 \le (z_0^T C) z \quad \text{whenever } (\pi, z) \in P. \tag{20}$$

Applying Lemma 1 to (20) (essentially, making use of duality in linear programming), we see that there must then exist $\pi$ and $x$ satisfying:

$$x \in R^n, \ \pi \in R^m; \qquad Ax \le 0, \ \pi \ge 0, \ Az_0 \le 0; \qquad u^T = A^T \pi_0 + C z_0, \ Cx - A^T \pi = C z_0, \ ux = z_0^T C z_0. \tag{21}$$

The following relations then are consequences of (21):

$$z_0^T C z_0 = z_0^T C x - z_0^T A^T \pi = x^T C z_0 - \pi^T A z_0 = ux - \pi_0^T A x - \pi^T A z_0 \ge ux.$$

As a result, $\pi_0^T A x = \pi^T A z_0 = 0$ and $z_0^T C z_0 = ux = x^T C z_0$. However, $Ax \le 0$ and $ux > 1$; thus by assumption in Theorem 3 we have $0 < (ux)^2 \le x^T C x$ and:

$$(x^T C x)^{1/2} \ge ux = z_0^T C z_0 = x^T C z_0 = x^T C x - \pi^T A x \ge x^T C x.$$

From the last relations we get $x^T C x \le 1$ and thus $ux \le (x^T C x)^{1/2} \le 1$, a contradiction.

Q.E.D.